With this project, we aim to present a simple yet cohesive and conclusive approach to one of the most relevant application fields in Data Science: smart city planning. For this purpose, we chose a relatively small and simple dataset that contains all traffic violations recorded since 2012 in Montgomery County, Maryland. Though simple, this dataset provides accurate and descriptive information on the nature of each traffic violation. In particular, we focus on one specific kind: traffic violations driven by alcohol consumption. We chose this subset in order to provide sound insight into how to reduce those violations, which are responsible for a significant number of deaths and injuries.

This project is divided into three sections. In the first, we provide some preliminary information to describe the dataset and explore how traffic violations are distributed along different dimensions. Next, we overlay traffic violations with bars serving alcohol, as a means to suggest potential explanations for the nature and number of these violations. Similarly, we also overlay metropolitan transportation stops in order to assess their proximity to the bars whose patrons seem to incur a high number of traffic violations. Finally, we provide conclusions and guidelines on possible ways to optimise the layout of public transportation stops and the transportation frequency, in order to possibly reduce the number of traffic violations caused by alcohol consumption.

The Montgomery County Data Set

From the entire dataset, and as depicted in the plot below, we focus only on alcohol-induced traffic violations, which constitute only \(3.6\%\) of all records. Even though this proportion might seem small, in subsequent sections we will show that the volume of data is adequate to provide insight into the current situation in Montgomery County.
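A share like this could be computed along the following lines. This is only a minimal sketch: the field names `description` and `alcohol`, the keyword list and the sample records are illustrative assumptions, not the actual schema or the filter used in this project.

```python
# Hypothetical records; the field names are assumptions, not the dataset's schema.
records = [
    {"description": "DRIVING VEHICLE WHILE UNDER THE INFLUENCE OF ALCOHOL", "alcohol": "No"},
    {"description": "EXCEEDING THE POSTED SPEED LIMIT", "alcohol": "No"},
    {"description": "FAILURE TO STOP AT RED SIGNAL", "alcohol": "Yes"},
    {"description": "DRIVER USING HANDS TO USE HANDHELD TELEPHONE", "alcohol": "No"},
]

# Assumed keywords that mark a violation as alcohol-related.
ALCOHOL_KEYWORDS = ("ALCOHOL", "INTOXICATED")


def is_alcohol_related(record):
    """Return True if the flag is set or the description mentions alcohol."""
    flagged = record.get("alcohol", "").strip().lower() == "yes"
    text = record.get("description", "").upper()
    return flagged or any(keyword in text for keyword in ALCOHOL_KEYWORDS)


def alcohol_share(rows):
    """Fraction of rows that are alcohol-related (0.0 for an empty input)."""
    if not rows:
        return 0.0
    return sum(is_alcohol_related(r) for r in rows) / len(rows)


share = alcohol_share(records)
print(f"{share:.1%} of the sample records are alcohol-related")
```

Combining a flag with a keyword match is a design choice made for this sketch; either criterion alone would give a narrower subset.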

As the plot below shows, the number of traffic violations triggered by alcohol consumption has been steadily increasing over the years, and the trend for 2016 seems to point in the same direction. Consequently, we consider that looking for ways to reduce this number is not only reasonable but also desirable.
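As an illustration of how a yearly trend like this could be tabulated, the sketch below counts violations per calendar year and checks whether the counts are non-decreasing. The date format and the sample dates are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical stop dates of alcohol-related violations (MM/DD/YYYY is assumed).
stop_dates = ["03/14/2012", "07/04/2013", "11/23/2013",
              "01/01/2014", "05/17/2014", "12/31/2014"]


def violations_per_year(dates, fmt="%m/%d/%Y"):
    """Count violations per calendar year, returned in chronological order."""
    counts = Counter(datetime.strptime(d, fmt).year for d in dates)
    return dict(sorted(counts.items()))


def is_non_decreasing(counts_by_year):
    """True if each year's count is at least the previous year's count."""
    values = list(counts_by_year.values())
    return all(a <= b for a, b in zip(values, values[1:]))


yearly = violations_per_year(stop_dates)
print(yearly)
print("non-decreasing trend:", is_non_decreasing(yearly))
```

A year that is only partially covered by the data would need to be compared over the same months of earlier years before being included in such a check.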

In order to understand the nature of these traffic violations, we decided to analyse when they occur, considering three different axes: day of the week, time of day, and the combination of both (time of day for each day, displayed as a trellis plot). All three plots are displayed below:

It can clearly be seen that late night and early morning hours present the highest proportion of traffic violations, which immediately suggests nightlife activity as the main cause of these traffic violations. This is also supported by the following plot, which shows that most traffic violations occur around the weekend (that is, on Friday, Saturday and Sunday).

Since this information is not enough to conclude that the global pattern also holds locally, that is, that the night-time peak is not driven by a single day of the week, we decided to display a trellis plot that breaks the previous information down on a day-by-day basis:

We can clearly see, then, that this pattern (traffic violations occurring during late night and early morning hours) is repeated throughout the entire week, although the highest proportion is found on weekend days. As a final step, it is important to determine whether the pattern holds during the entire year. If so, we can conclude that weekend nightlife is indeed the main cause of these violations.
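The day-by-day breakdown behind a trellis plot amounts to counting violations per (weekday, hour) cell. The sketch below shows one way this could be done. The timestamp format, the sample timestamps and the definition of "night" as 22:00 to 04:59 are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical stop timestamps; the format is an assumption.
stops = ["2015-06-05 23:40", "2015-06-06 01:15", "2015-06-06 02:30",
         "2015-06-07 01:50", "2015-06-09 14:05", "2015-06-10 00:20"]


def weekday_hour_counts(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count violations per (weekday name, hour of day) cell."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.strptime(ts, fmt)
        counts[(WEEKDAYS[moment.weekday()], moment.hour)] += 1
    return counts


def night_share(counts, night_hours=frozenset({22, 23, 0, 1, 2, 3, 4})):
    """Share of violations falling in the assumed night-time window."""
    total = sum(counts.values())
    night = sum(n for (_, hour), n in counts.items() if hour in night_hours)
    return night / total if total else 0.0


grid = weekday_hour_counts(stops)
for day in WEEKDAYS:
    # One row per weekday, mapping hour of day to the number of violations.
    row = {hour: n for (d, hour), n in sorted(grid.items()) if d == day}
    print(f"{day:<9} {row}")
print(f"night share: {night_share(grid):.2f}")
```

Each printed row corresponds to one panel of the trellis plot, so the night-time peak can be checked separately for every day of the week.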


In the plot above, even though we see a slight increase in traffic violations during November and December, the number of violations per month does not differ significantly. Hence, we can conclude that measures applied during weekends will be effective throughout the entire year, which is a highly desirable characteristic for any measure we might suggest.


Geographical Exploration of Traffic Violations

The second part of our story focuses on the geographical aspects of the phenomenon we are exploring. First, we wanted to look at how traffic violations are distributed among the administrative territorial units that the dataset calls police districts. There are 7 main districts in Montgomery County, and they show different frequencies of violations, both overall and alcohol-related.
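A per-district comparison of this kind could be tabulated as in the sketch below. The district names and the sample rows are placeholders, not values from the dataset.

```python
from collections import defaultdict

# Hypothetical (district, is_alcohol_related) pairs; district names are placeholders.
violations = [
    ("District A", True), ("District A", False), ("District A", False),
    ("District A", False), ("District B", True), ("District B", True),
    ("District B", False), ("District C", False),
]


def district_summary(rows):
    """Return {district: (total, alcohol_related, alcohol_share)}."""
    totals = defaultdict(int)
    alcohol = defaultdict(int)
    for district, is_alcohol in rows:
        totals[district] += 1
        alcohol[district] += int(is_alcohol)
    return {d: (totals[d], alcohol[d], alcohol[d] / totals[d])
            for d in sorted(totals)}


summary = district_summary(violations)
for district, (total, n_alcohol, share) in summary.items():
    print(f"{district}: {total} total, {n_alcohol} alcohol-related ({share:.0%})")
```

Reporting the share next to the raw counts separates districts that simply record more violations from those where alcohol-related violations are over-represented.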

The direct comparison displayed by the chart above demonstrates two main take-aways that we are going to use later on:

We dive deeper into exploring the geographical structure of alcohol violations distribution by plotting them on a map:

The first thing we observe from this map is that violations tend to cluster around certain points. What is more, each district has its own centers of gravity, which we will try to identify further on.

As a possible preliminary explanation, this clustering of traffic violations could be related to the locations of bars, pubs and other drinking establishments in the area. As a side note, we obtained these locations by querying the public Yelp API. Specifically, we queried for bars and restaurants in Montgomery County, Maryland, with alcohol as a keyword. Additionally, subway station locations have been added to the map in order to better understand commuting patterns in the area.

Black diamonds represent public places, and the size of each diamond reflects the rating of the place on Yelp, which we decided to use as a proxy for its popularity in our analysis.
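The venue locations and ratings described above could be gathered roughly as sketched below. The endpoint, parameter names and response shape are those of the Yelp Fusion business search as we understand it, and they are assumptions here: the API version actually used for this project may have differed. The sample payload is hand-written so that no live request is needed.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: Yelp Fusion business search. The project may have used an
# older API version with a different URL and authentication scheme.
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_url(location, term, limit=50, offset=0):
    """Build the search URL for one page of results."""
    query = urllib.parse.urlencode(
        {"location": location, "term": term, "limit": limit, "offset": offset}
    )
    return f"{SEARCH_URL}?{query}"


def fetch_page(api_key, location, term, offset=0):
    """Fetch one page of results (performs a network request)."""
    request = urllib.request.Request(
        build_search_url(location, term, offset=offset),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_places(payload):
    """Keep name, rating and coordinates of each business that has a location."""
    places = []
    for business in payload.get("businesses", []):
        coords = business.get("coordinates") or {}
        if coords.get("latitude") is None or coords.get("longitude") is None:
            continue  # Skip entries that cannot be placed on the map.
        places.append({
            "name": business.get("name"),
            "rating": business.get("rating"),
            "lat": coords["latitude"],
            "lon": coords["longitude"],
        })
    return places


# Hand-written payload in the assumed response shape, used instead of a live call.
sample_payload = {"businesses": [
    {"name": "Example Pub", "rating": 4.0,
     "coordinates": {"latitude": 39.0, "longitude": -77.1}},
    {"name": "No Location Bar", "rating": 3.5, "coordinates": None},
]}
places = extract_places(sample_payload)
print(places)
```

The extracted `rating` is the value that would drive the marker size on the map.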

In general, the map suggests that those violation clusters indeed correlate with certain popular public places and transportation stations. A closer look at different areas provides additional insight:

Across these maps we find three different types of clusters, based on the objects around them:

  1. Clusters with bars around
  2. Clusters with bars and subway stops around
  3. Unidentified Clusters
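The three cluster types listed above can be expressed as a simple proximity rule. The sketch below assumes that cluster centers have already been identified by some means, and it uses a hypothetical radius of 0.5 km. All coordinates are made up for illustration, and this is not necessarily how the clusters were labelled in this project.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def has_nearby(center, points, radius_km):
    """True if any point lies within radius_km of the center."""
    return any(haversine_km(*center, *p) <= radius_km for p in points)


def classify_cluster(center, bars, stations, radius_km=0.5):
    """Label a cluster center by what lies within radius_km (assumed threshold)."""
    near_bar = has_nearby(center, bars, radius_km)
    near_station = has_nearby(center, stations, radius_km)
    if near_bar and near_station:
        return "bars and subway stops"
    if near_bar:
        return "bars"
    # Centers with no bar nearby fall into the third category,
    # even if a subway stop happens to be close.
    return "unidentified"


# Made-up coordinates (latitude, longitude) for illustration only.
bars = [(39.001, -77.100), (39.1005, -77.200)]
stations = [(39.002, -77.101)]
centers = [(39.000, -77.100), (39.100, -77.200), (39.200, -77.300)]

labels = [classify_cluster(c, bars, stations) for c in centers]
for center, label in zip(centers, labels):
    print(center, "->", label)
```

The radius controls how strict the rule is: a larger value moves more clusters out of the unidentified group, so it would need to be chosen with typical walking distances in mind.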